
Fix qwen3 30b #8

Merged: JohannesGaessler merged 2 commits into JohannesGaessler:ggml-meta-backend-8 from gaugarg-nv:fix_qwen3_30b on Feb 25, 2026

Conversation

@gaugarg-nv

Qwen-30B-A3B Q4_0 has an intermediate dimension of 768. Using a granularity of 256 forces an uneven split between GPUs, which is not supported by the current implementation.
@gaugarg-nv gaugarg-nv changed the base branch from master to ggml-meta-backend-8 February 24, 2026 11:36
@JohannesGaessler JohannesGaessler merged commit 87e172b into JohannesGaessler:ggml-meta-backend-8 Feb 25, 2026
JohannesGaessler pushed a commit that referenced this pull request Mar 7, 2026
* Fix crash with Qwen-30B-A3B Q4_0

Qwen-30B-A3B Q4_0 has an intermediate dimension of 768. Using a granularity of 256 forces an uneven split between GPUs, which is not supported by the current implementation.

* Decide block size based on tensor quantization type
JohannesGaessler pushed a commit that referenced this pull request Mar 8, 2026 (same commit message as above).

2 participants